First Experiments on an Hmm Based Double Layer Framework for Automatic Continuous Speech Recognition

نویسندگان

  • Albino Nogueiras
  • Marta Casar
  • José A. R. Fonollosa
  • Mónica Caballero
چکیده

The usual approach to automatic continuous speech recognition is what can be called the acoustic-phonetic modelling approach. In this approach, voice is considered to hold two different kinds of information—acoustic and phonetic—. Acoustic information is represented by some kind of feature extraction out of the voice signal, and phonetic information is extracted from the vocabulary of the task by means of a lexicon or some other procedure. The main assumption in this approach is that models can be constructed that capture the correlation existing between both kinds of information. The main limitation of acoustic-phonetic modelling in speech recognition is its poor treatment of the variability present both in the phonetic level and the acoustic one. In this paper, we propose the use of a slightly modified framework where the usual acoustic-phonetic modelling is divided into two different layers: one closer to the voice signal, and the other closer to the phonetics of the sentence. By doing so we expect an improvement of the modelling accuracy, as well as a better management of acoustic and phonetic variability. Experiments carried out so far, using a very simplified version of the proposed framework, show a significant improvement in the recognition of a large vocabulary continuous speech task, and represent a promising start point for future research.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

تخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت

The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...

متن کامل

Analysis of HMM temporal evolution for automatic speech recognition and utterance verification

This paper proposes a double layer speech recognition and utterance verification system based on the analysis of the temporal evolution of HMM’s state scores. For the lower layer, it uses standard HMM-based acoustic modeling, followed by a Viterbi grammarfree decoding step which provides us with the state scores of the acoustic models. In the second layer, these state scores are added to the re...

متن کامل

Analysis of HMM Temporal Evolution for Automatic Speech Recognition and Verification

This paper proposes a double layer speech recognition and utterance verification system based on the analysis of the temporal evolution of HMM’s state scores. For the lower layer, it uses standard HMM-based acoustic modeling, followed by a Viterbi grammar-free decoding step which provides us with the state scores of the acoustic models. In the second layer, these state scores are added to the r...

متن کامل

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006